Ultra-Summarization: A Statistical Approach to Generating Highly Condensed Non-Extractive Summaries

Authors

  • Michael J. Witbrock
  • Vibhu O. Mittal
Abstract

Michael J. Witbrock, Vibhu O. Mittal. Just Research, 4616 Henry Street, Pittsburgh, PA 15213. mwitbrock@lycos.com, mittal@justresearch.com

1 Introduction

Summarization is one of the most important capabilities required in writing. Most previous work on computational summarization has been on extractive summarization, where text spans, usually sentences, are selected for a summary. This approach has several drawbacks, including the inability to generate effective summaries shorter than a sentence. This is problematic when short "headline"-style summaries with only a few words are desired, because (1) sentences with summary content are actually usually longer than average [2], and (2) information in the document is often scattered across multiple sentences; extractive summarization cannot combine concepts in different text spans of the source document without using the whole spans. We describe an alternative approach to summarization, not based on sentence extraction, capable of generating summaries of any desired length: it does so by statistically learning models of both content selection and realization. Training this model on headline-style summaries, we have successfully evaluated this system on this task. This paper discusses, very briefly, the framework, our experiments, and some of the advantages and novel applications of this alternative approach to summarization.

2 System Design and Operation

A high-level view of the system is shown in Figure 1. The training corpus used in our experiments consisted of newswire articles from Reuters and the Associated Press available from the LDC. The data set consisted of approximately 25,000 news stories; the associated headlines were the target summaries to be learned. Pre-processing included filtering documents for extraneous content, bylines, etc. Tokenisation currently includes only contiguous character sequences, not including punctuation, but in principle may include additional information such as part-of-speech (POS) tags, semantic tags applied to words, even phrases. The system learns statistical models of the relationship between the source text units in a document and the target text units to be used in the summary of that document. This model describes both the order and likelihood of appearance of the tokens in the target documents in the context of certain tokens in the source and a partial target document. Both these models, for content selection and surface realization, learnt from the training set, are used to co-constrain each other during the search in the HEADLINE
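The paper gives no code, but the pair of co-constraining models it describes can be sketched concretely. The following is a minimal illustration, not the authors' implementation: it assumes a simple add-one-smoothed estimate of P(word appears in the headline | word appears in the document) for content selection, an add-one-smoothed bigram model over headline text for realization, and a fixed-length beam search in which both models jointly score every partial hypothesis. Restricting candidate headline words to words of the source document, and all function names, smoothing constants, and the corpus format, are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train(pairs):
    """pairs: list of (document_tokens, headline_tokens) tuples."""
    doc_count = Counter()          # documents containing word w
    both_count = Counter()         # documents where w is in doc AND headline
    bigram = defaultdict(Counter)  # bigram counts over headline tokens
    unigram = Counter()            # context counts for bigram smoothing
    for doc, head in pairs:
        doc_set, head_set = set(doc), set(head)
        for w in doc_set:
            doc_count[w] += 1
            if w in head_set:
                both_count[w] += 1
        prev = "<s>"
        for w in head + ["</s>"]:
            bigram[prev][w] += 1
            unigram[prev] += 1
            prev = w
    return doc_count, both_count, bigram, unigram

def content_logp(w, doc_count, both_count):
    # content selection: P(w chosen for headline | w in document), add-one smoothed
    return math.log((both_count[w] + 1) / (doc_count[w] + 2))

def realization_logp(prev, w, bigram, unigram, vocab_size):
    # surface realization: bigram P(w | prev) over headline text, add-one smoothed
    return math.log((bigram[prev][w] + 1) / (unigram[prev] + vocab_size))

def headline(doc, models, length=6, beam=10):
    """Beam search for a fixed-length headline built from the document's words."""
    doc_count, both_count, bigram, unigram = models
    vocab_size = len(unigram) + 1
    candidates = sorted(set(doc))
    beams = [(0.0, ["<s>"])]
    for _ in range(length):
        nxt = []
        for score, seq in beams:
            for w in candidates:
                if w in seq:   # crude repetition filter
                    continue
                s = (score
                     + content_logp(w, doc_count, both_count)           # content model
                     + realization_logp(seq[-1], w, bigram, unigram,    # realization model
                                        vocab_size))
                nxt.append((s, seq + [w]))
        beams = sorted(nxt, reverse=True)[:beam]
    return " ".join(beams[0][1][1:])   # drop the <s> marker
```

Given (document tokens, headline tokens) pairs, in place of the Reuters/AP corpus the paper used, train(pairs) populates the counts and headline(doc_tokens, models) emits a fixed-length candidate. The point the sketch tries to capture is the one the abstract makes: neither model is used alone; every partial headline is scored by the content model and the realization model together during the search.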



Publication date: 1999